A user specifies a hierarchical action tree via a user input device and a user interface
element. The action tree is arranged as a tree of file directories, with each node
of the tree corresponding to a file directory (or path). The user then specifies
classes of patterns assigned to each node (directory) of the tree using data mining 
queries or pattern templates. Once the system is so initialized, the pattern 
templates and data mining queries are executed, retrieving the patterns specified by 
the user from a database. The retrieved patterns assigned to a node of the tree are 
then stored in a file in the corresponding file directory. The user may now act on 
the discovered patterns and use the organized file structure. A pattern discovery
optimization element periodically checks whether the database has changed substantially
and, if it has, re-executes the data mining queries and pattern templates, which update
the contents of the file structure accordingly.
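The storage step described above can be sketched in Python. This is only an illustration under stated assumptions: the node paths, the `patterns.json` file name, the in-memory `PATTERNS` list standing in for the database, and the 10% row-count criterion for a "substantial" change are all hypothetical and do not come from the abstract.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the pattern database.
PATTERNS = [
    {"rule": "bread -> butter", "kind": "association"},
    {"rule": "age>60 -> churn", "kind": "classification"},
]

def run_query(kind):
    """Hypothetical data mining query: select patterns of one class."""
    return [p for p in PATTERNS if p["kind"] == kind]

def store_patterns(root, tree, query_fn):
    """Write the patterns retrieved for each node into that node's directory."""
    written = []
    for node_path, query in tree.items():
        directory = Path(root) / node_path
        directory.mkdir(parents=True, exist_ok=True)
        out_file = directory / "patterns.json"
        out_file.write_text(json.dumps(query_fn(query), indent=2))
        written.append(out_file)
    return written

def changed_substantially(old_rows, new_rows, threshold=0.1):
    """Assumed criterion: relative change in row count at or above the threshold."""
    if old_rows == 0:
        return new_rows > 0
    return abs(new_rows - old_rows) / old_rows >= threshold

# Action tree: each node (directory path) is assigned one class of patterns.
tree = {"marketing/cross_sell": "association", "retention/alerts": "classification"}

with tempfile.TemporaryDirectory() as root:
    files = store_patterns(root, tree, run_query)
    loaded = json.loads(files[0].read_text())
```

A periodic job could call `changed_substantially` and, when it returns true, call `store_patterns` again to refresh the files.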
ABSTRACT:

A database query system includes a query assistant that permits the user to enter 
only queries that are both syntactically and semantically valid (and that can be 
processed by an SQL generator to produce semantically valid SQL). Through the use of
dialog boxes, a user enters a query in an intermediate English-like language which
is easily understood by the user. A query expert system monitors the query as it is 
being built, and using information about the structure of the database, it prevents 
the user from building semantically incorrect queries by disallowing choices in the 
dialog boxes which would create incorrect queries. An SQL generator is also provided 
which uses a set of transformations and pattern substitutions to convert the 
intermediate language into a syntactically and semantically correct SQL query. 

The intermediate language can represent complex SQL queries while at the same time 
being easy to understand. The intermediate language is also designed to be easily 
converted into SQL queries. In addition to the query assistant and the SQL 
generator, an administrative facility is provided which allows an administrator to 
add a conceptual layer to the underlying database making it easier for the user to 
query the database. This conceptual layer may contain alternate names for columns 
and tables, paths specifying standard and complex joins, definitions for virtual 
tables and columns, and limitations on user access. 
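The idea of disallowing dialog choices that would lead to a semantically invalid query can be sketched as follows. Everything in this block is hypothetical: the alias names, table and column names, the single declared join, and the rule that a column may be offered only if its table is already in use or is reachable through a declared join. The abstract does not specify the intermediate language or the expert system's rules, so this is a toy stand-in, not the patented method.

```python
# Hypothetical conceptual layer: friendly names mapped to (table, column),
# plus the join paths an administrator has declared.
ALIASES = {
    "customer name": ("customers", "cust_nm"),
    "order total": ("orders", "total_amt"),
    "warehouse city": ("warehouses", "city"),
}
JOINS = {frozenset(("customers", "orders")): "customers.id = orders.customer_id"}

def allowed_choices(chosen):
    """Return the friendly names a dialog box may still offer."""
    tables = {ALIASES[name][0] for name in chosen}
    allowed = []
    for name, (table, _) in ALIASES.items():
        if name in chosen:
            continue
        # Offer a column if nothing is chosen yet, if its table is already
        # used, or if a declared join connects it to a used table.
        reachable = any(frozenset((table, t)) in JOINS for t in tables)
        if not tables or table in tables or reachable:
            allowed.append(name)
    return allowed

def to_sql(chosen):
    """Translate the chosen friendly names into a simple SQL string."""
    cols = [ALIASES[name] for name in chosen]
    tables = sorted({t for t, _ in cols})
    select = ", ".join(f"{t}.{c}" for t, c in cols)
    sql = f"SELECT {select} FROM {', '.join(tables)}"
    conds = [cond for pair, cond in JOINS.items() if pair <= set(tables)]
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql

choices = allowed_choices(["customer name"])
query = to_sql(["customer name", "order total"])
```

Because "warehouse city" has no declared join to the customers table, it is not offered once "customer name" has been chosen, so the user cannot build a query with an unjoined table.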
ABSTRACT:

A method for constructing a data structure for a data string of characters includes 
producing a matrix of sorted rotations of the data string. This matrix defines an A 
array which is a sorted list of the characters in the data string, a B array which 
is a permutation of the data string, and a correspondence array C which contains 
correspondence entries linking the characters in the A array to the same characters 
in the B array. A reduced A' array is computed to identify each unique character in
the A array and a reduced C' array is computed to contain every s-th entry of
the C array. The B array is segmented into blocks of size s. During a search, the A'
and C' arrays are used to index the B array to reconstruct any desired row from the
matrix of rotations. Through this representation, the matrix of rotations can
be used as a conventional sorted list for pattern matching or information retrieval
applications. A data structure containing only the A', B, and C' arrays has very little
memory overhead. The B array contains the same number of characters as the original
data string, and can be compressed in a block-wise manner to reduce its size. The A'
array has a fixed size equal to the size of the alphabet used to construct the data
string, and the C' array has a variable size according to the relationship n/s, where n
is the number of characters in the data string and s is the size of the blocks of
the B array. Accordingly, the data structure enables a tradeoff between access speed 
and memory overhead, the product of which is constant with respect to block size s. 
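A minimal Python sketch of the unreduced arrays may help make the construction concrete. It assumes that the A array is the first column of the sorted rotation matrix, the B array is the last column, and C[i] is the position in B of the same character occurrence as A[i]. It keeps the full A and C arrays and an unblocked B, so it does not show the reduced A' and C' arrays or the n/s space tradeoff; the `$` terminator in the example is also an assumption.

```python
from collections import defaultdict

def build_arrays(text):
    """Build A (first column), B (last column) and the correspondence array C."""
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    A = [r[0] for r in rotations]
    B = [r[-1] for r in rotations]
    # Match the k-th occurrence of each character in A with its k-th occurrence in B.
    positions = defaultdict(list)
    for j, ch in enumerate(B):
        positions[ch].append(j)
    seen = defaultdict(int)
    C = []
    for ch in A:
        C.append(positions[ch][seen[ch]])
        seen[ch] += 1
    return A, B, C

def row(A, C, i):
    """Reconstruct row i of the sorted rotation matrix by following C."""
    out = []
    for _ in range(len(A)):
        out.append(A[i])
        i = C[i]
    return "".join(out)

def find(A, C, pattern):
    """Binary search over the reconstructed rows, as in a sorted list."""
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if row(A, C, mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(A) and row(A, C, lo).startswith(pattern)

A, B, C = build_arrays("banana$")
```

Here `row` never consults the rotation matrix itself, which is why the matrix does not need to be stored once the arrays exist.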
ART-UNIT: 277

PRIMARY-EXAMINER: Homere; Jean R.



The invention concerns a method and apparatus for generating a tabulation of counts 
of occurrences of value combinations of a set of attributes over a relation 
consisting of a set of database records. The gathered counts (also referred to as 
sufficient statistics) of attribute occurrences or correlation counts are most
preferably used in building a classification or density estimation model from the 
database records that can be used to predict some attribute values based on other 
attribute values. A new SQL operator, designated the "UNPIVOT" operator, operates by
scanning the database records and, for each record, reorganizing its data to form an
UNPIVOTED data record that includes the combinations of attribute name, attribute
value, and the values for one or more selected class attributes. The UNPIVOTED table
can be used to produce the desired sufficient statistics in one scan of the data
using standard database engines. While materialization of the UNPIVOTED table would
cause a large added scan cost overhead, the UNPIVOT operator achieves the counts
without that added cost: by combining the UNPIVOT operator with other SQL
"select" and "group by" operators, the UNPIVOTED table can be counted without the
need for materializing it. The result is a guaranteed one-pass algorithm that does
not incur the added scan cost factor. The savings in scan cost can extend to several 
orders of magnitude compared to other methodologies for getting the counts supported 
by current database engines. The sufficient statistics so gathered can be used to 
drive a variety of data mining algorithms. 
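The effect of combining an unpivot step with grouping and counting can be imitated in Python. The sketch below is not the SQL operator itself: the record layout, attribute names, and the choice of a single class attribute are assumptions made for illustration. It yields one (attribute name, attribute value, class value) triple per attribute and counts the triples in a single scan, without ever storing the unpivoted rows.

```python
from collections import Counter

def unpivot(record, attributes, class_attr):
    """Yield one (attribute name, attribute value, class value) triple per attribute."""
    for name in attributes:
        yield (name, record[name], record[class_attr])

def sufficient_statistics(records, attributes, class_attr):
    """Count unpivoted triples in one pass; the unpivoted table is never materialized."""
    counts = Counter()
    for record in records:  # single scan of the data
        counts.update(unpivot(record, attributes, class_attr))
    return counts

# Hypothetical records with two attributes and one class attribute.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
stats = sufficient_statistics(records, ["outlook", "windy"], "play")
```

The resulting counts are the kind of table a classifier such as naive Bayes or a decision tree builder would consume.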
PRIMARY-EXAMINER: Borin; Michael



ABSTRACT:

A method and system for detecting coincidences in a data set of objects, where each 
object has a number of attributes. Iteratively, equally sized subsets of the data
set are sampled, and coincidences (co-occurrences of a plurality of attribute values
in one or more objects in the subset) are recorded. For each coincidence of 
interest, the expected coincidence count is determined and compared with the 
observed coincidence count; this comparison is used to determine a measure of 
correlation for the plurality of attributes for the coincidence. The resulting set 
of k-tuples of correlated attributes is reported, a k-tuple of correlated attributes 
being a plurality of attributes for which the measure of correlation is above a 
predetermined threshold. The method and system (implemented on an array of
processing nodes) are suitable for protein structure analysis, e.g., in HIV research.
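The sampling and comparison loop can be sketched for the simplest case of pairs (k = 2). The abstract does not say how the expected count or the measure of correlation is computed, so the sketch assumes an independence-based expected count and uses the observed-to-expected ratio as the measure; the function name, the toy data, and the threshold are hypothetical, and the sketch runs on a single machine rather than an array of processing nodes.

```python
import random
from collections import Counter
from itertools import combinations

def correlated_pairs(objects, subset_size, iterations, threshold, seed=0):
    """Report attribute index pairs whose best observed/expected ratio exceeds the threshold."""
    rng = random.Random(seed)
    singles = Counter()  # (attribute index, value) -> count
    pairs = Counter()    # pair of (attribute index, value) items -> observed count
    total = 0
    for _ in range(iterations):
        for obj in rng.sample(objects, subset_size):  # equally sized subsets
            items = list(enumerate(obj))
            singles.update(items)
            pairs.update(combinations(items, 2))
            total += 1
    best = {}
    for (a, b), observed in pairs.items():
        # Assumed expectation: the two values occur independently of each other.
        expected = singles[a] * singles[b] / total
        key = (a[0], b[0])
        best[key] = max(best.get(key, 0.0), observed / expected)
    return sorted(key for key, ratio in best.items() if ratio > threshold)

# Toy data: attributes 0 and 1 always agree; attribute 2 is unrelated.
OBJECTS = [("A", "x", "p"), ("A", "x", "q"), ("B", "y", "p"), ("B", "y", "q")]
result = correlated_pairs(OBJECTS, subset_size=2, iterations=200, threshold=1.5)
```

Extending the sketch to k-tuples would mean replacing `combinations(items, 2)` with `combinations(items, k)` and generalizing the expected count accordingly.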

39 Claims, 22 Drawing figures 
A system and method for computer modeling (10) and for making hyperstructures (51)
which are to be contained in a computer memory, which obtains measurements of
physical objects and activities which are related to the entity to be modeled in the
computer hyperstructure (51). The measurements are transformed into computer data
which corresponds to the physical objects and activities external to the computer
system (10). A plurality of independent dimensions (54) are created, where each
dimension (54) includes at least one element (58). A plurality of cells (56) are
created, each of which is associated with the intersection of two or more elements
(58), each cell (56) being capable of storing at least one value. At least one rule
domain (60) is associated with at least one cell (56), the rule domain (60)
including at least one rule for assigning values to the associated cells (56). A
domain modeling rule set (126) is prepared (300), which determines which of the
rules will provide the value associated with each of the cells (56), wherein
application of the domain modeling rule set (126) to the hyperstructure (51) causes
a physical transformation of the data corresponding to said physical objects which
are modeled in said hyperstructure (51).
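The relationship between dimensions, cells, rule domains, and a rule set that selects the applicable rule can be illustrated with a small Python sketch. The dimension names, the stored values, the profit formula, and the "first matching domain wins" ordering are all assumptions chosen for illustration; the abstract does not describe how the domain modeling rule set resolves competing rules.

```python
from itertools import product

# Two hypothetical dimensions, each with its elements.
DIMENSIONS = {"account": ["revenue", "cost", "profit"], "period": ["Q1", "Q2"]}

# One cell per intersection of elements, each able to store a value.
cells = {key: 0.0 for key in product(*DIMENSIONS.values())}
cells[("revenue", "Q1")] = 120.0
cells[("cost", "Q1")] = 80.0
cells[("revenue", "Q2")] = 150.0
cells[("cost", "Q2")] = 90.0

def stored(key, get):
    """Rule: the cell's value is whatever was stored in it."""
    return cells[key]

def profit(key, get):
    """Rule: profit is revenue minus cost for the same period."""
    _, period = key
    return get(("revenue", period)) - get(("cost", period))

# Assumed rule set: an ordered list of (covers, rule) domains; first match wins.
RULE_DOMAINS = [
    (lambda key: key[0] == "profit", profit),
    (lambda key: True, stored),
]

def value(key):
    """Apply the rule set to find the rule that provides this cell's value."""
    for covers, rule in RULE_DOMAINS:
        if covers(key):
            return rule(key, value)
    raise KeyError(key)

q1_profit = value(("profit", "Q1"))
```

Tracing which entry of `RULE_DOMAINS` matched for a given key is a rough analogue of inspecting the formula that produced a cell's value.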

Also disclosed is a method for querying computer hyperstructures (51), a
Hyperstructure Query Language, and a "cell explorer", which allows direct viewing of
the applied formulas that produce a specific value for a cell (56).

18 Claims, 17 Drawing figures 